Unraveling the contributions of prosodic patterns and individual traits on cross-linguistic perception of Spanish sentence modality

Cross-linguistic perception is known to be molded by native and second language (L2) experiences. Yet, the role of prosodic patterns and individual characteristics on how speakers of tonal languages perceive L2 Spanish sentence modalities remains relatively underexplored. This study addresses the gap by analyzing the auditory performance of 75 Mandarin speakers with varying levels of Spanish proficiency. The experiment consisted of four parts: the first three collected sociolinguistic profiles and assessed participants’ pragmatic competence and musical abilities. The last part involved an auditory gating task, where participants were asked to identify Spanish broad focus statements and information-seeking yes/no questions with different stress patterns. Results indicated that the shape of intonation contours and the position of the final stressed syllable significantly impact learners’ perceptual accuracy, with effects modulated by utterance length and L2 proficiency. Moreover, individual differences in pragmatic and musical competence were found to refine auditory and cognitive processing in Mandarin learners, thereby influencing their ability to discriminate question-statement contrasts. These findings reveal the complex interplay between prosodic and individual variations in L2 speech perception, providing novel insights into how speakers of tonal languages process intonation in a non-native Romance language like Spanish.


Introduction
The utilization of fundamental frequency (f0) differs between tonal and intonational languages, with tonal languages primarily using f0 modulation to convey lexical meanings, while intonational languages employ it to communicate postlexical information. For learners whose first language (L1) is tonal, such as Mandarin Chinese, acquiring an intonational language like Spanish as a second language (L2) often presents significant challenges due to these disparities [1][2][3]. Despite the extensive research on L2 perceptual acquisition, focusing on various speech components such as segments [4], stress [5,6], prominence [7,8], and lexical tones [9], limited attention has been given to the perception of sentence-type contrasts in contact situations involving tonal and non-tonal languages. Currently, most of the scientific literature examining L2 sentence perception by tonal language speakers has centered on Mandarin learners of English, mainly attributing cross-linguistic discrepancies to the influence of the learners' L1 [10][11][12][13]. However, recent findings suggest that the ability of L2 listeners to process segmental and suprasegmental features of speech may be influenced by various interindividual factors, including biological characteristics [4,14,15], psychometric traits [7,16], and music expertise [17,18]. On the other hand, previous acoustic analyses of Spanish intonation have provided evidence of distinct intonation contours that categorically differentiate statements from questions, thereby offering prosodic cues that can enhance native listeners' perceptual accuracy [19]. Additionally, other prosodic-related elements, such as the location of the final stressed syllable, have been theorized to impact the production of Spanish intonation by altering the trajectory of pitch movements [20]. Although the knowledge from previous works has contributed greatly to the current understanding of L2 speech perception, it remains unclear how these prosodic and individual factors interact and affect the auditory processing of L2 sentence modalities in Spanish for speakers of tonal languages.
The present study aims to address this research gap, based on the expanding importance of multilingualism in today's globalized society, where individuals frequently learn an L2 that differs greatly from their L1 in phonetic and phonological dimensions. Specifically, this study focuses on examining how (a) prosodic elements, including intonation pattern and stress location, and (b) individual traits such as age, gender, pragmatic skills, and musical ability, impact the perception of Spanish sentence modalities by native speakers of Mandarin Chinese who have acquired Peninsular Spanish as an L2. By doing so, this study aims to unravel the complexities of L2 auditory processing at the sentence intonation level and potentially contribute to future design of more effective instructional materials and curricula for L2 Spanish learners from tonal languages. The remainder of the paper is structured as follows. The Introduction section reviews the relevant literature on L2 perception and outlines the research questions of the study. The Methods section describes the methodology employed in this study. The Results section presents the statistical results, followed by a discussion of the findings in the Discussion section. Finally, the paper concludes in the Conclusions section.

Prosodic cues influencing Spanish sentence perception
In Peninsular Spanish, distinguishing between statements and yes/no questions can be challenging due to similarities in word choice and grammar, prompting heavy reliance on intonation contours for discerning intended linguistic meanings [21][22][23]. Prior phonetic analyses have evidenced four intonation patterns categorically marking declarative versus interrogative modality in Spanish [19,24]. The most salient is pitch movement at the utterance end, with final rises signaling questions and final falls indicating statements. This pattern can be recognized cross-linguistically by both native Spanish speakers and non-native speakers from various L1 backgrounds. For example, learners of Spanish with both Chinese and English as their L1 were found to be able to distinguish statements from yes/no questions based solely on the final rising or falling intonation contour [25,26], suggesting potential prosodic universality for conveying sentence type. This may be determined by biological codes, e.g., the Frequency Code, which associates higher f0 with uncertainty, thereby serving as an interrogative marker, and lower f0 with assertiveness, marking declarative modality [27,28].
While the final pitch contour strongly conveys the intonation category, early pitch movements in the prenuclear position also contribute to sentence type identification. Prior research found that the broad focus statement in Spanish exhibits an initial f0 rise on stressed syllables, with the first f0 peak occurring poststress. After the last prenuclear syllable, the prosodic contour downsteps progressively until the utterance end. By contrast, the yes/no question demonstrates higher initial peaks aligned with stressed syllable offsets [29,30]. The observed differences in early pitch movements are reported to play a crucial role in shaping the perception of sentence types in Spanish [19,31], as well as in other Romance languages such as Italian [32]. An auditory gating experiment conducted with native Spanish speakers, for instance, demonstrated that participants were able to identify the sentence type with a 95% accuracy rate after being exposed to just the contour that included the initial f0 peak [31]. In addition to the previous cues, two less commonly observed intonation patterns also contribute to this perceptual process, albeit to a lesser extent. One is f0 movement in the medial position of the sentence, characterized by a rising f0 pattern in statements that consist of more than two stressed words, a feature not present in yes/no questions. The other is the f0 contour observed during the final stressed syllable, where statements typically exhibit a slight rise in f0, while yes/no questions tend to show an f0 dip [19,31]. Despite these observations, the impact of these particular intonation patterns on L2 learners' perception of sentence modality remains a less explored area. Future research is needed to delve deeper into understanding how these subtleties in intonation might influence native tonal language speakers' interpretation of sentence modality in Spanish and other Romance languages.
Furthermore, the positioning of the final stressed syllable may modulate Spanish intonation processing by adjusting the shape of the final pitch movement [33,34]. Prior research indicates that rising-falling patterns are particularly susceptible to the placement of the final stressed syllable [20]. Nevertheless, it remains unclear whether variations in utterance-final word stress affect the perception of Spanish intonation by native and non-native listeners. Experimental studies focusing on this issue found that Mandarin learners of Spanish struggled to recognize words with stress on the last syllable compared to those with stress on the penultimate syllable, potentially due to conflicting functions of f0 in the final position of oxytone words [35,36]. Yet these studies examined L2 learners' sensitivity to continuous changes in three acoustic dimensions (f0, duration, and intensity) of intonation. Further research is therefore needed to investigate potential impacts of stress patterns on categorical perception of Spanish intonation contours vital for signaling sentence modality in both native and non-native speakers.

Interspeaker variations in L2 perceptual processing
Although speakers within the same linguistic community often exhibit consistent behaviors, significant individual variations can still be observed in processing complex linguistic categories [37,38]. The impact of speaker-specific factors on speech perception has been interpreted in many ways over time, with numerous dimensions of individual differences being investigated. For instance, research on the relationship between L2 proficiency and speech production/perception has shown mixed results. While some studies have found that a higher proficiency level correlates with better phonetic and phonological performance in L2 speech production [26,39,40,41], the link between proficiency and L2 perception is less clear. One study revealed no significant correlation between Mandarin learners' L2 proficiency and perceptual accuracy of Spanish questions [25]. However, other research indicates that English learners improve their perception of Spanish prenuclear pitch alignment, approaching native-like performance as L2 proficiency increases [26]. High-proficiency adults also appear more accurate at between-category discrimination of non-native novel sounds, potentially due to a superior ability to use higher-level cognitive processing like attention to perceive target phonemic boundaries [42,43].
On the other hand, early L2 acquisition is believed to enhance perception, yielding more target-like performance than adult acquisition [4,14,42,43]. For example, early Spanish-English bilinguals show higher accuracy than late bilinguals in categorical discrimination of English vowels [4]. Although some studies have found no significant early versus late bilingual perceptual differences, neural processing distinctions have been observed in L2 sounds, suggesting that the age of acquisition may affect the activation of brain regions necessary for auditory processing [42]. However, compared to the age of acquisition, the effect of listeners' chronological age on perceiving L2 Spanish phonetic and phonological contrasts has received less attention. Gender is another biological factor, akin to age, that may potentially shape perceptual abilities. Females demonstrate greater accuracy in identifying the prominence of English stressed words compared to males [7]. However, this distinction may be limited to semantic processing and potentially not extend to other processing types [15]. Future research should therefore examine the role of gender in diverse perceptual tasks like Spanish intonation identification, providing insights into the complex interplay between gender, language context, and sentence modality while elucidating cross-linguistic gender-based perceptual learning strategies.
Psychometric measurements also offer interpretations of individual perceptual differences. Previous studies have linked "autistic traits" per the Autism Spectrum Quotient (AQ) to interspeaker variation in prominence identification [7,44,45,46]. For example, English listeners with higher AQ scores (thus poorer pragmatic skills) exhibited lower sensitivity to prosodic manipulation of prenuclear prominence in online lexical processing [7,44], as well as lower perception of prominence in words parsed into phonologically weak positions [4]. However, as with biological factors like age and gender, psychometric perceptual effects likely depend on the specific linguistic categories being distinguished.
Finally, musical expertise has been theorized to enhance speech perception abilities. For instance, multiple studies have shown that musicians have greater sensitivity to changes in relative pitch structures and melodic contours in both native [47,48] and non-native languages [49,50]. Similarly, individuals with music training experience were found to exhibit better L2 pronunciation [51] and excel at discriminating various speech features including phonemic vowel length contrasts [52] and lexical tone variations [53,54]. However, musical advantages in perception may only apply to auditory tasks relying on the same type of skills developed through long-term music practice, like pitch and duration processing [55][56][57]. Further research is necessary to determine if L2 learners with advanced musical abilities demonstrate an improved perception of Spanish sentential contrast, given the importance of pitch and melodic processing in music and intonation.

The present study
Building on earlier findings, speech perception emerges as a multifaceted process shaped by factors across dimensions. Comprehensively understanding tonal language speakers' perception of Spanish sentences necessitates considering both prosodic and individual factors broadly. The goals of this study are to (1) assess the auditory performance of Mandarin learners of Spanish in perceiving broad focus statements and information-seeking yes/no questions, and (2) examine the role of diverse factors in processing L2 sentence modalities. Specifically, we investigate two underexplored research questions for the Mandarin-Spanish language pair. RQ1: How do intonation contours and final stress placement affect native Mandarin speakers' perception of Spanish sentence modalities? RQ2: Is Mandarin speakers' processing of Spanish question-statement contrasts modulated by individual characteristics including L2 proficiency, age, gender, pragmatic skill, and musical ability?
To address these questions, we constructed a perceptual test drawing upon the gating paradigm as utilized in prior studies [16,28,52]. The subsequent sections describe the process of preparing the auditory stimuli and administering the online experiment.

Ethics statement
The study was conducted online using an electronic survey. The introductory page of the survey detailed the participation requirements, experimental procedures, and data privacy policy. By clicking 'I agree and next' at the bottom of this page, participants provided documented confirmation of informed consent, affirming that they were over 18 years of age and had understood the instructions. Participation was entirely voluntary, and participants could stop and withdraw at any time. As this study collected data online without physical harm to the participants and anonymized the data, an exemption from full ethics review was granted by the Ethics Committee of Beijing Institute of Technology (approval code: BIT-EC-H-2023132).

Participants
The study included a total of seventy-five native Mandarin speakers (M age = 25.09, SD = 3.76, Min age = 18, Max age = 35), comprising 28% men and 72% women. Participants were compensated monetarily for their participation in the experiment. All participants were born in mainland China and reported being predominantly exposed to Peninsular Spanish during their language learning process. On average, they began learning L2 Spanish at the age of 19.73 years (SD = 3.00). The participants were divided into three language groups based on their proficiency level in Spanish: B1 (N = 22), B2 (N = 27), and C1 (N = 26). While approximately 60% of the participants' L2 proficiency was assessed using their recent Spanish DELE (Diploma of Spanish as a Foreign Language) certificate, the remaining participants were asked to self-evaluate their Spanish proficiency using the six reference levels in the Common European Framework of Reference for Languages (CEFR). Fig 1 displays the distribution of age and gender in each language group. None of the participants reported any history of hearing or communication problems during testing.
Given the limited number of subjects meeting specific selection criteria and the relatively weaker predictive value of factors such as participants' place of origin in China, L2 immersion status, and length of exposure to the target language environment for L2 phonological and phonetic accuracy [58,59], the present study did not strictly control for these variables. However, Table 1 provides this information to complete the sociodemographic profiles of the participants.

Materials
Four pairs of broad focus statements and information-seeking yes/no questions were used to create stimuli. Each pair consisted of a morpho-syntactically identical statement and yes/no question (see Table 2). The four pairs differed in length and therefore in the number of stressed words. For each length, we included two stress patterns for the last word: penultimate stressed syllable (paroxytone) and final stressed syllable (oxytone). We used the Discourse Completion Task (DCT) [60] method to elicit the semi-spontaneous production of two distinct intonation types, each fulfilling a specific linguistic function: broad focus statements and information-seeking yes/no questions. To supplement the dialogues' situational framework, we integrated visual stimuli created through OpenAI's DALL-E, an AI image generation tool [61]. Each scenario within our dialogue series was paired with an image, generated from a descriptive prompt processed by DALL-E (see S1 Appendix for details of the image generation prompts), to visually underscore the context. These images were assessed by the authors to confirm their representational accuracy and unbiased nature. Additionally, dialogues were consistently initiated by an interlocutor known to the native Spanish speaker, a methodological decision aimed at mitigating potential biases due to power dynamics and social distance [62].
The individual producing the speech material was a 31-year-old female native of Spain, who lived and was educated in urban Cantabria until reaching adulthood. Her intonation represents the standard pattern of Cantabria Spanish, which is consistent with Castilian intonation and epitomizes the most generalized variant of Peninsular Spanish [63][64][65]. Specifically, her speech production is characterized by a falling contour in broad focus statements and a rising contour in information-seeking yes/no questions. It is crucial, moreover, to distinguish these standard patterns from the traditional variety of Cantabrian Spanish, mainly preserved in the rural areas of the northwestern Peninsula and marked by a contrasting final falling contour (H* HL%) in yes/no questions [63,64]. In this study, we did not include this rural pattern because it diverges considerably from most Spanish varieties and is often not included in L2 Spanish instructional materials and settings. The rising pattern for yes/no questions, being more prototypical and widely recognized, was therefore selected as the basis for our stimulus creation, reflecting the intonation most Mandarin learners are likely to be familiar with. For a detailed

Stimulus creation
The recordings described in the Materials section were used to generate stimuli within a gating paradigm. We divided the recorded sentences into several fragments, also referred to as gates, based on the intonational events that carry differences between the two sentence modalities rather than on segmental information. As a result, the gates we created align with established intonational events in Spanish phonology, and sentences containing one and two stressed words yield different numbers of gates. Moreover, unlike some common practices in auditory gating experiments, which involve manipulating the intonation contours through adjustments to pitch height [19,31,66], our procedure focused solely on segmenting the utterances without any modification to the acoustic correlates. This strategy ensured that listeners were gradually exposed to the sentences' natural speech pattern, preserving the authentic prosodic characteristics of the spoken language.
For sentences with two stressed words, gate 1 contained the intonational contour of the first stressed syllable Vie, where the statement had a slight rise starting near the beginning of the stressed syllable, while the question did not (see Fig 2). Gate 2 consisted of the initial f0 peak, which may sometimes include an adjacent syllable of the following word due to synalepha. This peak was reported to be the first strong differential cue in determining the type of long sentences because questions have a different prenuclear accent than statements in Spanish [19,29]. The difference can also be seen in Fig 2, where the yes/no question has a 71 Hz higher f0 peak (335 Hz) than the statement (264 Hz). Gate 3 began after the first peak and ended before the utterance-final syllable. Therefore, for sentences ending with a paroxytone word, gate 3 included the contour of the final stressed syllable, which unequivocally differentiates the question (rising contour) from the statement (low-falling contour). For sentences ending with an oxytone word, gate 3 did not contain a stressed syllable. This means that gate 3 had more cues for correct recognition in final paroxytone sentences than in oxytone ones. However, we kept this division to have the same number of gates for both stress types and to explore the effect of the final stressed syllable on intonation categorization. The final gate (gate 4) of sentences with two stressed words contained the intonation contour of the last syllable, which is traditionally the last but most typical cue for signaling the question-statement contrast.
For sentences comprised of a single stressed word, we created two gates, positioning the division boundary just before the final syllable (see Fig 3). As a result, the prosodic configuration of gate 1 in these single-word sentences exhibited subtle variations based on the stress pattern. In the case of paroxytone words, gate 1 included the contour of the nuclear stressed syllable, which usually yields a relatively high-rising pitch for statements and a low-falling pitch for yes/no questions. Conversely, for oxytone words, we intentionally segmented gate 1 to exclude the nuclear stressed syllable, establishing a distinct perceptual contrast between the two types of stress patterns. Gate 2, the final gate, consisted of the intonation contour of the entire sentence, encompassing the typical final pitch movements observed in both statements and questions. The segmentation of speech was performed using Praat [67]. To create the final stimuli for the experiment, we progressively compiled the gates, starting with the first gate alone, followed by a combination of the first and second gates, and continuing in this manner until the entire utterance was incorporated. This process resulted in a total of 24 distinct stimuli for the experiment. Each stimulus was saved as a separate sound file, with an added 500 milliseconds of silence both preceding and following the spoken content. Audio clips for the stimuli generated in this study are available in the S2 Appendix.
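The cumulative compilation of gates described above can be illustrated with a minimal sketch. It works on labels rather than audio, and the function names, the gate labels, and the sample rate argument are hypothetical choices made for illustration; the actual segmentation was carried out in Praat.

```python
# Design facts taken from the text: sentences with two stressed words have
# four gates, sentences with one stressed word have two gates.
GATES_PER_LENGTH = {"two_stressed_words": 4, "one_stressed_word": 2}
STRESS_PATTERNS = ("paroxytone", "oxytone")
MODALITIES = ("statement", "yes/no question")


def cumulative_stimuli(gates):
    """Return gate 1, gates 1-2, ... up to the whole utterance."""
    return [tuple(gates[: i + 1]) for i in range(len(gates))]


def build_stimulus_list():
    """Enumerate every gated stimulus in the design (labels only, no audio)."""
    stimuli = []
    for length, n_gates in GATES_PER_LENGTH.items():
        gates = [f"gate{i}" for i in range(1, n_gates + 1)]  # hypothetical labels
        for stress in STRESS_PATTERNS:
            for modality in MODALITIES:
                for fragment in cumulative_stimuli(gates):
                    stimuli.append((length, stress, modality, fragment))
    return stimuli


def pad_with_silence(samples, sample_rate, silence_ms=500):
    """Add silence before and after a list of audio samples.

    The sample rate is an assumption; the text only states the 500 ms padding.
    """
    pad = [0.0] * int(sample_rate * silence_ms / 1000)
    return pad + list(samples) + pad


print(len(build_stimulus_list()))  # 24
```

The count follows from the design: 2 stress patterns × 2 modalities × 4 gates for the longer sentences (16 stimuli) plus 2 × 2 × 2 for the shorter ones (8 stimuli).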

Procedure
The Alchemer survey platform (https://www.alchemer.com/) was employed for data collection. The survey comprised four parts. The first was designed to gather sociolinguistic information including the participant's origin, age, gender, education, and linguistic background such as L2 proficiency, L2 immersion status, exposure time in Spain, and so forth.
The second part focused on measuring participants' musical ability, which pertains to their sensitivity in perceiving various auditory features such as pitch, melody, tuning, rhythm, and tempo. In particular, this study measured participants' perceptual abilities in pitch and melody, which are considered primary phonetic properties of sentential intonation in Spanish [68]. To achieve this, we employed two specific tasks: the pitch and melody subsets from the PROMS (Profile of Music Perception Skills) test [69]. During both tasks, participants were presented with audio pairs of different complexities via earphones and were required to rate the similarity of the test sound to the reference audio clip using a five-point scale (definitely the same, probably the same, I don't know, probably different, definitely different). The final scores for each participant were automatically generated and subsequently gathered by the first author.
The third part aimed to assess participants' pragmatic skills and included items from the Autism Spectrum Quotient (AQ) questionnaire. The complete AQ contains five subscales: social skills, communication, attention switching, attention to details, and imagination [70]. In this study, we selected the 10 items from the communication subscale, which have been considered a rough proxy for speakers' pragmatic ability to engage the prosody in a specific context [7]. Participants' responses to these items were evaluated with a four-point scale (strongly agree, slightly agree, slightly disagree, strongly disagree) and their total AQ scores were calculated by summing the score for each item. Higher scores on the AQ indicated more "autistic-like" traits and therefore were considered to reflect poorer pragmatic skills [71]. The original English AQ items were translated into Chinese based on prior work [72].
Finally, the fourth part presented the stimuli for the perceptual test, during which participants were instructed to listen to each of them using earphones in a quiet room. Before beginning the formal task, participants completed a practice trial to familiarize themselves with the procedure. The full text of the audio was displayed on the screen without punctuation marks. The task of the participants was to identify whether the stimulus they heard was a "Statement" or a "Yes/no question" by clicking the appropriate button on the screen. The perceptual choices and stimuli were presented in a random order to avoid listeners' response bias.

Statistical analysis
Before statistical modeling, compound musical ability scores were computed for each participant by averaging the pitch and melody task scores. Then, we performed a two-step cluster analysis of these scores to automatically classify the participants into high and low musical ability levels. Generalized linear models (GLMs) were subsequently used to analyze the dataset, with perceptual accuracy (true vs. false) as the dependent variable operationalized by participant responses to each stimulus. The GLMs incorporated binary variables including gender (male vs. female), sentence modality of the stimulus (statement vs. yes/no question), stress position of the final word (final vs. penultimate), and musical ability (high vs. low). Discrete ordinal variables included L2 proficiency level (B1 < B2 < C1) and gate (1 < 2 < 3 < 4 for longer sentences; 1 < 2 for shorter sentences). Meanwhile, age, AQ score, and stimulus order were treated as continuous factors and standardized via z-transformation prior to model inclusion.
Two multivariable GLMs were constructed in this study, one for sentences with one stressed word and one for sentences with two stressed words. Stepwise backward elimination was performed using the stepAIC function from the MASS package to determine the best-performing models [73]. Additionally, estimated marginal means (EMMs) objects were generated with the contrast function to analyze model interactions [74]. The R software environment was utilized for all statistical analyses [75].
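The two preprocessing steps that precede modeling, averaging the pitch and melody scores and z-transforming the continuous predictors, can be sketched as follows. This is an illustration in Python rather than the R code used in the study; the function names and the example values are made up, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not state which variant was applied.

```python
from statistics import mean, stdev


def compound_musical_score(pitch_score, melody_score):
    """Average the two PROMS subtest scores into one compound score."""
    return (pitch_score + melody_score) / 2


def z_transform(values):
    """Standardize values to mean 0 and standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1): an assumption, see lead-in
    return [(v - centre) / spread for v in values]


# Made-up ages for illustration only, not the study's data.
ages = [18, 25, 35]
z_ages = z_transform(ages)
print([round(z, 2) for z in z_ages])
```

After standardization, a model coefficient for age, AQ score, or stimulus order describes the change associated with a one standard deviation increase in that predictor.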

Results for sentences with two stressed words
The significance test of the first GLM revealed that Mandarin learners' accuracy in perceiving sentences with two stressed words was significantly influenced by the interaction between sentence modality and gate [χ²(3) = 24.25, p < .0001], L2 proficiency [χ²(2) = 14.09, p < .001], pragmatic skill [χ²(1) = 31.19, p < .0001], and musical ability [χ²(1) = 6.04, p < .05]. However, stress position, age, and gender did not significantly impact auditory evaluation of L2 sentences. Regarding the interaction of sentence modality with gate, Table 3 shows that Mandarin learners performed significantly better identifying L2 statements than yes/no questions without salient interrogative cues. In particular, the contrast coefficient, transformed into an odds ratio (OR), indicated that the odds of correct recognition in gate 1 were 4.03 times higher for statements than for yes/no questions.
Table 3 also reveals that Mandarin learners' ability to perceive Spanish sentential contrasts is associated with their proficiency level in L2. As depicted in Fig 4, learners with higher proficiency levels (e.g., B2 and C1) displayed significantly greater accuracy in distinguishing L2 sentence modalities compared to those with a B1 level. Although the interaction between L2 proficiency and gate did not reach statistical significance, multiple comparisons revealed distinct auditory performance among the three language groups after exposure to each of the four gates. Specifically, from gate 1 to gate 2, both B2 (z = 3.08, p < .05) and C1 learners (z = 3.54, p < .01) exhibited significant progress in perceptual accuracy, while B1 learners did not show clear improvement after perceiving the initial f0 peak (gate 2). B1 and B2 learners then significantly increased perceptual accuracy (both ps < .05) after being presented with the f0 downslope after the highest peak (gate 3). Finally, the three L2 groups neared 100% accuracy upon the release of the final f0 contour, but only B1 learners significantly improved after listening to the last gate.
Table 3. Results of the regression model fitted to sentences with two stressed words. (The estimated coefficients and confidence intervals were transformed from log-odds to odds ratios.)

Additionally, the results presented in Fig 5A show a robust negative association between Mandarin learners' AQ scores (where higher scores reflect poorer pragmatic skills) and their ability to accurately associate intonation patterns with sentence modalities in L2 Spanish. This association is further supported by the coefficient of the AQ variable, as indicated in Table 3, which suggests that for every one standard deviation increase in the AQ score, the odds of correctly identifying L2 Spanish sentence types decrease by 37%. Further, the results depicted in Fig 5B indicate that Mandarin learners with higher levels of musical ability performed significantly better in perceiving L2 sentential contrasts. Specifically, individuals with high musical ability levels exhibit 56% higher odds of accurately distinguishing Spanish sentence modalities compared to those with lower musical ability levels.

Results for sentences with one stressed word
Table 4 shows that statements were significantly more likely to be correctly identified than yes/no questions, particularly in gate 1 (see Fig 6). All three language groups exhibited a significant increase in perceptual accuracy from gate 1 to gate 2. In addition, the effect of stress position indicated that paroxytone words were easier for learners to perceive compared to oxytone words. This distinction was primarily observed in gate 1 of the two types of stressed words (z = -2.37, p < .05), while in gate 2, the difference did not attain statistical significance (z = 0.57, p > .1).
Furthermore, the results in Table 4 show that the impact of pragmatic skills on perceiving one-word sentences aligned with the findings for longer sentences, with 43% lower odds of accurate perception per standard deviation increase in AQ score. Similarly, learners with higher musical ability had 1.88 times the odds of correctly identifying Spanish sentence modality compared with those with lower musical ability. Fig 6 also illustrates that Mandarin learners were able to enhance their identification rates as L2 proficiency improved. However, this enhancement was not statistically significant in perceiving sentences with one stressed word, which seems to contradict the results for longer sentences. This discrepancy likely stems from the distribution of intonation cues across one-word stimuli. Specifically, in gate 1, all three L2 groups displayed similarly low perceptual accuracy due to the lack of robust cues that clearly indicated sentence modality. In contrast, gate 2 provided the salient cue of the final pitch contour universally signaling question-statement contrast, enabling perception improvements regardless of Mandarin learners' proficiency in Spanish.
Table 4. Results of the regression model fitted to sentences with one stressed word. (The estimated coefficients and confidence intervals were transformed from log-odds to odds ratios.)
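An odds ratio does not correspond to a fixed change in accuracy, since the resulting shift in probability depends on the baseline. The sketch below applies the two effects reported for one-word sentences (43% lower odds per standard deviation of AQ, i.e., an odds ratio of about 0.57, and an odds ratio of 1.88 for musical ability) to hypothetical baseline accuracies. The baselines of 0.60 and 0.90 are illustrative assumptions, not values from Table 4, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob: float, or_value: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)  # probability -> odds
    new_odds = odds * or_value                    # scale the odds by the odds ratio
    return new_odds / (1.0 + new_odds)            # odds -> probability


# Effects taken from the text; baselines are hypothetical.
effects = {"AQ score (+1 SD)": 0.57, "Musical ability (high)": 1.88}

for baseline in (0.60, 0.90):
    for label, or_value in effects.items():
        p = apply_odds_ratio(baseline, or_value)
        print(f"baseline accuracy {baseline:.2f}, {label}: predicted accuracy {p:.3f}")
```

The same odds ratio produces a smaller change in probability when the baseline is already close to ceiling, which is worth keeping in mind when reading effects at later gates, where accuracy approaches 100%.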

Discussion
The current study formulated two research questions aimed at exploring the cross-linguistic perception of Spanish sentence modalities by Mandarin-speaking listeners. The first focused on examining the influence of prosodic components, including intonation contours and the final stress pattern, on perceiving Spanish broad focus statements and information-seeking yes/no questions. The results demonstrate the pivotal role of prosodic elements in determining question-statement identification in a cross-linguistic context. However, the efficacy of these two factors appears to vary based on sentence modality, L2 proficiency, and utterance length. In particular, our study found that Mandarin learners were more inclined to perceive an utterance as a statement in gate 1, where no salient interrogative cues were present. This observation aligns with prior studies involving native Spanish listeners [19,25,31,76], suggesting a universal inclination towards unmarked linguistic forms in communication. Declaratives are generally considered the most neutral and unmarked sentence type, carrying the least communicative load across languages including Spanish, Basque, and various Germanic languages [77–79]. Consequently, their default manifestation in the form of statements would be more readily processed and understood than yes/no questions by Mandarin learners of Spanish. This propensity underscores the cognitive efficiency of utilizing unmarked structures in language comprehension and acquisition.
Our results also reveal that all intonational cues distributed across the utterance contributed to some degree to identifying Spanish sentences, but the ability to accurately recognize these cues depended on learners' L2 proficiency. Highly proficient Mandarin learners at the B2 and C1 levels exhibited significant improvements in identifying sentences by utilizing the initial f0 peak, while B1 learners struggled to use this information to signal question-statement contrasts. Similar challenges in L2 Spanish perception were reported in past work by Trimble [76] and Li [25], potentially stemming from insufficient experience with target prosody. Moreover, the results showed that learners at the lower proficiency level (B1) relied primarily on the f0 downslope of the nuclear syllable and final pitch movement to identify sentences containing two stressed words. In contrast, the final intonational pattern did not seem to significantly impact the identification of longer sentences for learners at the C1 level. However, further analysis using single-word utterances revealed the final pitch shift as the most reliable cue for all L2 learners in perceiving question-statement contrasts, once acoustic interference from the initial f0 peak and the nuclear syllable contour was excluded. These findings are consistent with previous research, which reported the final pitch contour as the strongest cue in discriminating Spanish sentence modality, capable of changing native listeners' perceptual decisions made with earlier contradictory cues [19]. The consistency in auditory weighting for the final intonational contour between Spanish L1 and Mandarin L2 speakers suggests potential universal effects on specific intonational forms and their paralinguistic meanings in speech communication [27,80,81].
Our research further revealed that the impact of the final stress position on the perception of L2 Spanish sentence modalities is modulated by utterance length. Notably, in short sentences, we observed a significantly higher perceptual accuracy for penultimate stressed words than final stressed words, particularly when identified within gate 1. This disparity is mainly ascribed to the pitch movement contrasts present in gate 1 of the penultimate stressed words, wherein the nuclear stressed syllable typically displayed a high-rising contour in statements and a divergent low-falling contour in yes/no questions. Our data suggest that Mandarin learners were sensitive to these phonetic distinctions, utilizing them as an important cue to discern target sentence modalities, thereby improving the perception of paroxytone words. In contrast, long sentences containing two stressed words and ending with a paroxytone manifested a uniform low-falling contour on the final stressed syllable across both statements and yes/no questions, eliminating a distinctive phonetic feature that could aid in long sentence identification. Consequently, we observed no significant difference in perceptual accuracy between long sentences ending with different stress patterns. These findings lend support to prior research which asserts that the pitch contour of the final stressed syllable is a weaker cue for the discrimination of long sentences in Spanish, yet it remains pivotal for assessing the naturalness of interrogative sentences [31].
Regarding the second research question, our results indicated that individual characteristics such as L2 proficiency, pragmatic skills, and musical ability can fine-tune Mandarin learners' auditory processing and cognitive abilities, thereby impacting their perceptual performance in L2. Highly proficient Mandarin learners were more accurate at identifying Spanish statements and questions, especially pitch-contrastive structures not present in their L1 tonal categories, e.g., the initial f0 peak. This finding corroborates previous studies linking enhanced L2 proficiency to improved cross-linguistic perception, including speech comprehensibility [82] and speech recognition in noise [83]. A potential explanation is that high proficiency correlates with increased activity in later-developing brain regions related to higher-order cognition like attention, which may aid in processing non-native speech objects [14].
Another individual feature significantly affecting L2 sentence perception is pragmatic skill. Our study found that Mandarin learners with poorer pragmatic skills, as indexed by higher AQ scores, were less adept at recognizing the mapping between intonational patterns and linguistic meanings in specific contexts. This aligns with previous research on psychometric properties in speech production and perception [7,44–46], which, for example, found that English listeners with lower pragmatic skills had greater difficulty perceiving prominence patterns in their L1 as well. Although the precise mechanisms underlying individual variations in pragmatic ability remain unclear, there appears to be a relationship between pragmatics and prosodic sensitivity.
The last individual factor important to the perception of L2 sentence modality is musical ability. Our results showed that Mandarin learners with strong musical abilities performed better in perceptual processing of Spanish question-statement contrasts compared to those with low musical abilities. This corroborates previous work on music-language interaction, which demonstrates that high levels of music performance skills can facilitate speakers' learning process in multiple aspects of speech perception, including lexical tones [18] and melodic contours [47]. This advantage likely stems from a positive transfer of auditory skills developed through music to speech perception, since both domains rely on shared acoustic and neural resources for processing speech signals [56,84]. Notably, however, we did not find significant effects of biological factors like age and gender on L2 perception of Spanish sentence modality.

Conclusions
The present study examines the factors impacting native Mandarin speakers' perception of Spanish sentence modalities. While prior cross-linguistic research between tonal and non-tonal languages has predominantly focused on the role of L1, our results demonstrate that L2 sentence perception entails a more intricate mechanism that integrates both prosodic features and individual traits. Considering both sets of factors enables a more comprehensive and precise assessment of L2 perceptual learning outcomes. Furthermore, our findings suggest that the development of sentence type perception toward a more native-like system depends on multifaceted factors, including prosodic awareness, L2 proficiency level, pragmatic skills, and musical ability, potentially acquired through natural linguistic environments or specialized target training. Overall, this study offers valuable insights into the perception of Spanish sentence modalities within a cross-linguistic context and furnishes foundational data that can inform subsequent research aimed at devising training methodologies and techniques for L2 Spanish intonation.
However, some limitations should be acknowledged. First, the gender imbalance among participants may affect the generalizability of our results; hence, future studies should aim for a more balanced representation. Second, the current study's scope, limited by the range of sentence modalities and the reliance on a single speaker's speech materials, may not fully capture the spectrum of Spanish intonation patterns. Future research should endeavor to broaden the corpus to include a wide array of native speakers, particularly from the central Castilian region. Such an approach would yield a more comprehensive and balanced investigation that carefully considers both the richness and representativeness of intonation data, alongside the diverse qualities of L2 learners. Third, while our findings suggest that the location of the final stressed syllable affects L2 sentence perception, the intricate process of how Mandarin learners of Spanish perceive and decode stress and intonation in parallel, particularly when these prosodic features are conveyed through shared acoustic dimensions, remains an area for additional exploration. Moreover, the 10-item AQ communication subscale may not have captured the full complexity of pragmatic competence. Future studies should consider employing a more nuanced array of assessment tools, possibly with multiple raters, to better evaluate this aspect. Lastly, the potential influence of multilingual backgrounds on L2 perception was not controlled for in this study, which is a limitation considering the impact that proficiency in additional languages could have. We plan to make this a key consideration in the design of our future research to ensure a more thorough analysis of the factors affecting L2 sentence perception.

Fig 2. Intonation contours of the sentence pair with two stressed words aligned with syllables and gates. (S1, gate 1 of the statement; Q1, gate 1 of the yes/no question; and so on for the rest of the abbreviations.) https://doi.org/10.1371/journal.pone.0298708.g002

Table 1. Mandarin participants' immersion status and exposure length in Spain. Column headers: Immersion status in Spain; Exposure time in Spain.
https://doi.org/10.1371/journal.pone.0298708.t001
[...] overview of the elicitation procedure and the sample dialogues, refer to the S1 Appendix. The recordings were conducted in a speech lab using a Rode Smartlav+ microphone connected to a Scarlett interface. The audio files were digitized at a 44.1 kHz sampling rate with 16-bit quantization precision.

Table 2. Categorization of the recorded utterances used to create stimuli. Column headers: Sentence modality; No. of stressed words; Stress pattern of the last word; Recordings and English translations.
a Alcalá is the name of a city in the autonomous community of Madrid, Spain, and is pronounced [alkaˈla] in Spanish.
b Sevilla is the name of the largest city in the autonomous community of Andalusia, Spain, and is pronounced [seˈβiʎa] in Spanish.
https://doi.org/10.1371/journal.pone.0298708.t002